


Mahalanobis' distance disclosed in U.S. Patent Number 4,991,216.

$$L_k = \sum_{i,\mathrm{time}} \left( B_{ki} - 2 A_{ki}^{t} \cdot X \right)$$
where:

$W^{-1}$ is the inverse matrix of $W$,
$\mu_k^{t}$ is the transpose of $\mu_k$,

$L_k$ is the distance between a speaker's utterance at state (k) (phoneme
order or time sequence) and the trained pattern of each category,

$\mu_k$ is the average value of the LPC cepstral coefficient vectors of
state (k) (phoneme order or time sequence) for each category,

$\mu_x$ is the average value of the LPC cepstral coefficient vectors of all
utterances by all training speakers,

$W$ is the covariance value of the LPC cepstral coefficient vectors of all
utterances by all training speakers, and

$X$ is a continuous LPC cepstral coefficient vector of an input speech
sound generated by a speaker.

Using trained patterns for categories 1, 2, and 3, distances $L_{1k}$,
$L_{2k}$, and $L_{3k}$ are obtained by the following equations.



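$$L_{1k} = \sum_{i,\mathrm{time}} \left( B_{1ki} - 2 A_{1ki}^{t} \cdot X \right)$$

$$L_{2k} = \sum_{i,\mathrm{time}} \left( B_{2ki} - 2 A_{2ki}^{t} \cdot X \right)$$

$$L_{3k} = \sum_{i,\mathrm{time}} \left( B_{3ki} - 2 A_{3ki}^{t} \cdot X \right)$$

where: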

$\mu_{1k}$ is the average value of the LPC cepstral coefficient vectors of
state (k) (phoneme order or time sequence) of an arrangement "Television" of
speech sound elements for category 1,

$\mu_{2k}$ is the average value of the LPC cepstral coefficient vectors of
state (k) (phoneme order or time sequence) of an arrangement "Television" of
speech sound elements for category 2,

$\mu_{3k}$ is the average value of the LPC cepstral coefficient vectors of
state (k) (phoneme order or time sequence) of an arrangement "Television" of
speech sound elements for category 3,

$\mu_x$ is the average value of the LPC cepstral coefficient vectors of all
utterances by all training speakers,

$W$ is the covariance value of the LPC cepstral coefficient vectors of all
utterances by all training speakers, and

$X$ is the LPC cepstral coefficient vector when a user speaks
"Television".

This distance calculation uses, as the entire distribution, all
utterances by speakers in the various characteristic categories discussed
above. Therefore, these equations are extremely effective for selecting the
trained patterns.
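
As a minimal sketch of the distance calculation above, the following Python
function evaluates a sum of the form L_k = sum over i of (B_ki - 2 A_ki^t . X)
for each category. It assumes that the coefficients B_ki and A_ki have already
been derived from the trained patterns (their derivation is not shown here) and
that one input LPC cepstral coefficient vector is paired with each summation
index. The names `simplified_distance`, `patterns`, and `frames`, and all
numeric values, are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def simplified_distance(b_terms: Sequence[float],
                        a_vectors: Sequence[Vector],
                        frames: Sequence[Vector]) -> float:
    """Evaluate sum_i (B_i - 2 * A_i . X_i) for one category.

    Assumption: b_terms[i] and a_vectors[i] are precomputed from the
    trained pattern, and frames[i] is the input vector for index i.
    """
    if not (len(b_terms) == len(a_vectors) == len(frames)):
        raise ValueError("all inputs must cover the same indices")
    return sum(b - 2.0 * dot(a, x)
               for b, a, x in zip(b_terms, a_vectors, frames))


# Hypothetical precomputed coefficients for categories 1, 2, and 3.
patterns: Dict[int, Dict[str, List]] = {
    1: {"B": [1.0, 2.0], "A": [[0.5, 0.0], [0.0, 0.5]]},
    2: {"B": [1.0, 1.0], "A": [[0.0, 0.5], [0.5, 0.0]]},
    3: {"B": [3.0, 0.5], "A": [[1.0, 0.0], [0.0, 0.25]]},
}

# Hypothetical input vectors for the spoken word (two frames, two dimensions).
frames: List[List[float]] = [[1.0, 0.0], [0.0, 2.0]]

# One distance per category, keyed by category number.
distances = {
    category: simplified_distance(p["B"], p["A"], frames)
    for category, p in patterns.items()
}
```

With these illustrative values, category 1 yields the smallest distance, so its
trained patterns would be the candidates for selection.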

The present embodiment uses four words, "Television", "Video", "Air
conditioner", and "Light", as defined pattern selection words. When the
pattern selection words are also used for the device selection, the number of
pattern selection words is preferably the same as the number of controlled
devices. When the pattern selection and the device selection are performed
using different words, a smaller number of pattern selection words can produce
the same advantage. For example, when "Instruction" is used as a pattern
selection word and "Instruction_Television_Increase-sound" or
"Instruction_Light_Turn off" is spoken, "Television" and "Increase sound", or
"Light" and "Turn off", are used as device control words. Even one pattern
selection word can thus improve recognition performance of the subsequent
words.

Next, the distances obtained in step S403 for the trained patterns of the
respective categories are compared with each other (step S404). In the
present embodiment, the distances $L_{1k}$, $L_{2k}$, and $L_{3k}$ obtained in
step S403 are compared with each other.
Based on the comparison result in step S404, the vocabulary indicating a
controlled device and the category that have the shortest distance are
selected (step S405).
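
Steps S404 and S405 can be sketched as a minimum search over the computed
distances. The sketch below assumes the distances from step S403 are held in a
dictionary keyed by (pattern selection word, category); the function name
`select_nearest` and the distance values are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (pattern selection word, category number)


def select_nearest(distances: Dict[Key, float]) -> Key:
    """Return the (word, category) pair with the shortest distance."""
    if not distances:
        raise ValueError("no distances to compare")
    # Step S404: compare all distances; step S405: pick the smallest.
    return min(distances, key=distances.get)


# Hypothetical distances from step S403 for two words and three categories.
example_distances: Dict[Key, float] = {
    ("Television", 1): 4.2,
    ("Television", 2): 1.3,
    ("Television", 3): 2.8,
    ("Video", 1): 5.0,
    ("Video", 2): 3.9,
    ("Video", 3): 6.1,
}

selected_word, selected_category = select_nearest(example_distances)
```

Here the selected word identifies the controlled device, and the selected
category identifies which trained patterns to use in the following step.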

Patterns to be selected are reconstructed in response to the nearest
pattern selected in step S405 (step S406). When the trained patterns of
category 1 are selected in step S405, average values 311 of utterances by
training speakers in category 1 and covariance values 321 of utterances by
the training speakers in category 1 are used as the trained patterns to be
selected. When the trained patterns of category 2 are selected in step S405,
average values 312 of utterances by training speakers in category 2 and
covariance values 322 of utterances by the training speakers in category 2 are



